Serveur d'exploration sur l'OCR

Attention, ce site est en cours de développement !
Attention, site généré par des moyens informatiques à partir de corpus bruts.
Les informations ne sont donc pas validées.

Date of Birth Extraction Using Precise Shallow Parsing

Identifieur interne : 000790 ( Main/Exploration ); précédent : 000789; suivant : 000791

Date of Birth Extraction Using Precise Shallow Parsing

Auteurs : Ray Pereda [États-Unis] ; Kazem Taghva [États-Unis]

Source :

RBID : Pascal:10-0429703

Descripteurs français

English descriptors

Abstract

This paper presents the implementation and evaluation of a pattern-based program to extract date of birth information from OCR text. Although the program finds data of birth information with high precision and recall, this type of information extraction task seems to be negatively impacted by OCR errors.


Affiliations:


Links toward previous steps (curation, corpus...)


Le document en format XML

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Date of Birth Extraction Using Precise Shallow Parsing</title>
<author>
<name sortKey="Pereda, Ray" sort="Pereda, Ray" uniqKey="Pereda R" first="Ray" last="Pereda">Ray Pereda</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">10-0429703</idno>
<date when="2010">2010</date>
<idno type="stanalyst">PASCAL 10-0429703 INIST</idno>
<idno type="RBID">Pascal:10-0429703</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000162</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000615</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000164</idno>
<idno type="wicri:doubleKey">0277-786X:2010:Pereda R:date:of:birth</idno>
<idno type="wicri:Area/Main/Merge">000795</idno>
<idno type="wicri:Area/Main/Curation">000790</idno>
<idno type="wicri:Area/Main/Exploration">000790</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Date of Birth Extraction Using Precise Shallow Parsing</title>
<author>
<name sortKey="Pereda, Ray" sort="Pereda, Ray" uniqKey="Pereda R" first="Ray" last="Pereda">Ray Pereda</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Document retrieval</term>
<term>High precision</term>
<term>Implementation</term>
<term>Information extraction</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Syntactic analysis</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Reconnaissance forme</term>
<term>Recherche documentaire</term>
<term>Analyse syntaxique</term>
<term>Implémentation</term>
<term>Reconnaissance optique caractère</term>
<term>Précision élevée</term>
<term>Extraction information</term>
<term>0130C</term>
<term>4230S</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Recherche documentaire</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This paper presents the implementation and evaluation of a pattern-based program to extract date of birth information from OCR text. Although the program finds data of birth information with high precision and recall, this type of information extraction task seems to be negatively impacted by OCR errors.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Nevada</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Nevada">
<name sortKey="Pereda, Ray" sort="Pereda, Ray" uniqKey="Pereda R" first="Ray" last="Pereda">Ray Pereda</name>
</region>
<name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
</country>
</tree>
</affiliations>
</record>

Pour manipuler ce document sous Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000790 | SxmlIndent | more

Ou

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000790 | SxmlIndent | more

Pour mettre un lien sur cette page dans le réseau Wicri

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:10-0429703
   |texte=   Date of Birth Extraction Using Precise Shallow Parsing
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024